Poster: WikiSourcerer Date: Jun 15, 2013 5:30pm
Forum: texts Subject: OCR questions

I am wondering whether there is a way to have the same OCR routine(s) that run during the typical PDF-upload-to-DjVu derivation also run when you upload just a DjVu file. If I'm following the derive log correctly, an uploaded PDF has its pages converted to .jp2 files, and each .jp2 is processed with OCR to create an XML file that amounts to the "text layer". Later on, the DjVu variant is created, and that XML is used to embed the detected text into the new DjVu.

I guess I'm asking whether the same OCR step(s) are possible using an uploaded .djvu file as the "core" file, rather than some other typical image (.jpg etc.) or document (.pdf etc.) file type. TIA.


Poster: Jeff Kaplan Date: Jun 15, 2013 10:10pm
Forum: texts Subject: Re: OCR questions

The list of formats derived from original files is at [link not preserved]. The ABBYY OCR file is not derived from DjVu.
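
To illustrate why an uploaded DjVu would not trigger the OCR step, here is a minimal sketch of the derive chain as described in the question. The format names and the `DERIVED_FROM` map are hypothetical stand-ins, not the Archive's actual derive configuration, and the assumption that the DjVu is built from the .jp2 pages plus the OCR XML is only inferred from the description above.

```python
# Hypothetical derivation map (an assumption for illustration only):
# each derivative lists the inputs that must already exist to build it.
DERIVED_FROM = {
    "jp2": ["pdf"],                       # PDF pages are converted to .jp2
    "ocr_xml": ["jp2"],                   # OCR runs on the .jp2 images
    "djvu_with_text": ["jp2", "ocr_xml"], # DjVu is built, OCR text embedded
}


def reachable(source):
    """Return the set of derivatives that can be built from one uploaded format."""
    available = {source}
    changed = True
    while changed:
        changed = False
        for output, inputs in DERIVED_FROM.items():
            # A derivative can be built only if every one of its inputs exists.
            if output not in available and all(i in available for i in inputs):
                available.add(output)
                changed = True
    return available - {source}


if __name__ == "__main__":
    print(sorted(reachable("pdf")))
    print(sorted(reachable("djvu")))
```

In this sketch nothing lists "djvu" as an input, so an uploaded DjVu reaches no OCR derivative, while an uploaded PDF reaches all three.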